We will use functions from the packages base, utils, and stats (pre-installed and pre-loaded).
We will also use the packages below (specifying package::function for clarity).
# Load them for this R session
# General
library(fs) # file/directory interactions
library(here) # tools to find your project's files, based on working directory
here() starts at /Users/luisamimmi/Github/R4biostats
library(paint) # paint data.frames summaries in colour
library(janitor) # tools for examining and cleaning data
Attaching package: 'janitor'
The following objects are masked from 'package:stats':
chisq.test, fisher.test
library(dplyr) # {tidyverse} tools for manipulating and summarizing tidy data
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(forcats) # {tidyverse} tool for handling factors
library(openxlsx) # Read, Write and Edit xlsx Files
library(flextable) # Functions for Tabular Reporting
# Statistics
library(rstatix) # Pipe-Friendly Framework for Basic Statistical Tests
Attaching package: 'rstatix'
The following object is masked from 'package:janitor':
make_clean_names
The following object is masked from 'package:stats':
filter
library(lmtest) # Testing Linear Regression Models
Loading required package: zoo
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
library(broom) # Convert Statistical Objects into Tidy Tibbles
library(tidymodels) # not installed on this machine
Name: NHANES (National Health and Nutrition Examination Survey) combines interviews and physical examinations to assess the health and nutritional status of adults and children in the United States. Started in the 1960s, it became a continuous program in 1999.
Documentation: dataset1
Sampling details: Here we use a sample of 500 adults from NHANES 2009-2010 & 2011-2012 (nhanes.samp.adult.500 in the R oibiostat package, which has been adjusted so that it can be viewed as a random sample of the US population).
Adapt the call to the function here to match your own folder structure.
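As a rough sketch of what adapting the path can look like: here::here() builds a file path starting from the project root rather than from the current working directory. The folder name "data" and the file name "my_data.csv" below are hypothetical placeholders, not the names used in this course; replace them with your own.

```r
library(here)

# here() with no arguments returns the project root detected for this session
here::here()

# Build a path relative to the project root.
# "data" and "my_data.csv" are hypothetical names: replace them with your own.
data_path <- here::here("data", "my_data.csv")
data_path

# Check that the file is really there before trying to read it
file.exists(data_path)
```

Because the path is assembled from the project root, the same line keeps working when the script is run from a sub-folder of the project.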
We can start looking at how the model performs by applying it to our nhanes_test sub-sample, using the function predict.
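A minimal sketch of this step follows. The model formula and the training and test objects are not shown in this section, so the example uses simulated stand-in data instead of NHANES: the names toy_train, toy_test, toy_fit and the variables x and y are assumptions made for illustration only.

```r
# Simulated stand-in data (NOT the NHANES sample): y depends linearly on x
set.seed(123)
toy <- data.frame(x = runif(200, min = 20, max = 80))
toy$y <- 10 + 0.3 * toy$x + rnorm(200, sd = 5)

# Split into training (160 rows) and test (40 rows) sub-samples
train_idx <- sample(seq_len(nrow(toy)), size = 160)
toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# Fit the linear model on the training data only
toy_fit <- stats::lm(y ~ x, data = toy_train)

# Apply the fitted model to the unseen test data
toy_pred <- stats::predict(toy_fit, newdata = toy_test)
head(toy_pred)
```

The key argument is newdata: without it, predict returns the fitted values for the data the model was trained on.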
Linear regression performance: predicted values in test sample
We can look at the 95% CI of all the predicted values
We can look at the 95% CI of a single predicted value
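One way to obtain these intervals is the interval argument of predict for lm objects. The sketch below again uses simulated stand-in data rather than NHANES; toy_fit, toy_test, x and y are hypothetical names.

```r
# Simulated stand-in data and model (NOT the NHANES sample)
set.seed(123)
toy <- data.frame(x = runif(200, min = 20, max = 80))
toy$y <- 10 + 0.3 * toy$x + rnorm(200, sd = 5)
train_idx <- sample(seq_len(nrow(toy)), size = 160)
toy_fit  <- stats::lm(y ~ x, data = toy[train_idx, ])
toy_test <- toy[-train_idx, ]

# 95% confidence interval of the predicted mean for every test row
ci_all <- stats::predict(toy_fit, newdata = toy_test,
                         interval = "confidence", level = 0.95)
head(ci_all)  # matrix with columns fit, lwr, upr

# 95% confidence interval of a single predicted value (first test row)
ci_one <- stats::predict(toy_fit, newdata = toy_test[1, , drop = FALSE],
                         interval = "confidence", level = 0.95)
ci_one
```

Setting interval = "prediction" instead returns the wider interval for an individual new observation rather than for the mean response.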
Linear regression performance: RMSE
Basically we are asking: “how does the prediction compare to the actual test dataset?”
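For this, we take the difference between each predicted value and the actual value, square it, average over the test sample, and take the square root:
RMSE = Root Mean Squared Error = sqrt(mean((predicted - actual)^2))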
This is quite close to the Residual standard error that we got from the regression model summary (6.843), even though that value was computed on the training data and this one comes from the testing data.
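The RMSE calculation and its comparison with the residual standard error can be sketched as follows. This uses simulated stand-in data rather than NHANES, so the numbers will not match the 6.843 reported above; toy_fit, toy_test, x and y are hypothetical names.

```r
# Simulated stand-in data and model (NOT the NHANES sample)
set.seed(123)
toy <- data.frame(x = runif(200, min = 20, max = 80))
toy$y <- 10 + 0.3 * toy$x + rnorm(200, sd = 5)
train_idx <- sample(seq_len(nrow(toy)), size = 160)
toy_fit  <- stats::lm(y ~ x, data = toy[train_idx, ])
toy_test <- toy[-train_idx, ]

# Predictions on the test sample
toy_pred <- stats::predict(toy_fit, newdata = toy_test)

# RMSE: square root of the mean squared difference between predicted and actual
rmse_test <- sqrt(mean((toy_pred - toy_test$y)^2))

# Residual standard error of the model, estimated on the training sample
rse_train <- summary(toy_fit)$sigma

c(rmse_test = rmse_test, rse_train = rse_train)
```

The two quantities are not computed identically: the residual standard error divides by the residual degrees of freedom of the training fit, while the RMSE here divides by the number of test rows. When the model generalizes well they should still be of similar size.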